1.  Introduction


There  have  been  a  number  of  studies  on  Poisson  approximations  for 
sums  of  uniformly  small  random  variables.  Of  paramount  interest  is  the 
total-variation  distance  between  a  sum  of  random  variables  and  a  Poisson 
variable.  The  total-variation  distance  between  two  probability  measures 
(distributions)  F  and  G  on  some  measurable  space  is  defined  by 

(1.1)  $d(F,G) = \sup_B |F(B) - G(B)|$,

where  the  supremum  is  over  all  measurable  sets  (2d(F,G)  is  the  total 
variation  of  the  signed  measure  F-G).  The  total-variation  distance 
between  random  elements  X  and  Y  with  the  respective  distributions  F  and  G 
is $d(X,Y) = d(F,G)$.
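
For distributions on a finite set, the supremum in (1.1) is attained and equals half the sum of the absolute differences of the point masses. The following sketch illustrates this on a small example; the function names `tv_distance` and `tv_distance_bruteforce` and the two example distributions are illustrative assumptions, not part of the paper.

```python
from itertools import combinations

def tv_distance(f, g):
    """Total-variation distance between two pmfs given as dicts {point: mass}."""
    support = set(f) | set(g)
    return 0.5 * sum(abs(f.get(x, 0.0) - g.get(x, 0.0)) for x in support)

def tv_distance_bruteforce(f, g):
    """Evaluate sup_B |F(B) - G(B)| directly over every subset B of the support."""
    support = sorted(set(f) | set(g))
    best = 0.0
    for r in range(len(support) + 1):
        for subset in combinations(support, r):
            diff = sum(f.get(x, 0.0) - g.get(x, 0.0) for x in subset)
            best = max(best, abs(diff))
    return best

# Hypothetical example distributions on {0, 1, 2, 3}.
f = {0: 0.5, 1: 0.3, 2: 0.2}
g = {0: 0.4, 1: 0.4, 3: 0.2}
print(tv_distance(f, g), tv_distance_bruteforce(f, g))
```

Both evaluations agree here; the brute-force version is only feasible for very small supports and is included to mirror the definition.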

Building on the works of Hodges and Le Cam (1960), Le Cam (1960),
Franken (1964) and Freedman (1974), Serfling (1975) proved this result:

If $X_1,\ldots,X_n$ are non-negative integer-valued random variables adapted to
the increasing $\sigma$-fields $\{\mathcal{F}_i\}$, then

(1.2)  $d(\sum_{i=1}^n X_i, N) \le \sum_{i=1}^n [(E p_i)^2 + E|p_i - E p_i| + P(X_i \ge 2)]$,

where $p_i = P(X_i = 1 \mid \mathcal{F}_{i-1})$ and N is Poisson with mean $\sum_{i=1}^n E p_i$. Comparable
bounds for other Poisson approximations appear in Barbour and Eagleson
(1983),  Brown  (1983),  Chen  (1975),  Kabanov  et  al.  (1983),  Kerstan  (1964), 
Valkeila  (1982)  and  their  references.  Such  bounds  are  useful  for  proving 
limit  theorems  for  random  variables  and  point  processes  as  well. 

In  this  paper,  we  present  analogues  of  (1.2)  for  compound  Poisson 
approximations for sums. We consider sums of random elements that take
values in a measurable group S: the group operation, addition, is
measurable. If X is a random element of S with the compound Poisson
distribution $H(B) = \sum_{n=0}^{\infty} F^{n*}(B) \alpha^n e^{-\alpha}/n!$, then we say X is CP$(\alpha,F)$. If X
has the distribution $EH(\cdot)$, where $\alpha$ or F are random, then we say X is
mixed CP$(\alpha,F)$.
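
When S is the integers and F is concentrated on the non-negative integers, the series defining H can be evaluated numerically by truncation. The sketch below is one way to do this, under the assumptions that the support is cut off at `size` points and the series at `terms` terms; the names `convolve` and `compound_poisson_pmf` are hypothetical helpers introduced for illustration.

```python
import math

def convolve(f, g, size):
    """Convolution of two pmfs on {0, 1, ..., size-1}, truncated to that range."""
    out = [0.0] * size
    for i, fi in enumerate(f):
        if fi == 0.0:
            continue
        for j, gj in enumerate(g):
            if i + j < size:
                out[i + j] += fi * gj
    return out

def compound_poisson_pmf(alpha, f, size, terms=60):
    """Truncated series H = sum_n F^{n*} alpha^n e^{-alpha} / n! on {0,...,size-1}."""
    f = (list(f) + [0.0] * max(0, size - len(f)))[:size]
    power = [1.0] + [0.0] * (size - 1)   # F^{0*} is the unit mass at 0
    h = [0.0] * size
    weight = math.exp(-alpha)            # Poisson weight for n = 0
    for n in range(terms):
        for k in range(size):
            h[k] += weight * power[k]
        power = convolve(power, f, size)  # F^{(n+1)*}
        weight *= alpha / (n + 1)         # Poisson weight for n + 1
    return h

# With F the unit mass at 1, CP(alpha, F) reduces to the Poisson distribution.
poisson_like = compound_poisson_pmf(2.0, [0.0, 1.0], size=10)
print(poisson_like[:4])
```

The truncation is harmless when `alpha` is moderate and `size` comfortably exceeds the bulk of the support; for heavy-tailed F a larger `size` would be needed.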

Here is our main result. Let $X_1,\ldots,X_n$ be random elements of S
adapted to the increasing $\sigma$-fields $\{\mathcal{F}_i\}$, and define

$p_i = P(X_i \ne 0 \mid \mathcal{F}_{i-1})$,  $F_i(B) = P(X_i \in B \mid \mathcal{F}_{i-1}, X_i \ne 0)$.

Let F be a distribution on S with $F(\{0\}) = 0$, and define, by (1.1), the
random distance $d_i = d(F_i,F)$ ($F_i$ is random but F is not).


Theorem 1. If Z is mixed CP$(\sum_{i=1}^n p_i, F)$, then

(1.3)  $d(\sum_{i=1}^n X_i, Z) \le E[\sum_{i=1}^n (d_i + p_i^2)]$.

If Z is CP$(\sum_{i=1}^n \alpha_i, F)$ where $\alpha_i = E p_i$, then

(1.4)  $d(\sum_{i=1}^n X_i, Z) \le E[\sum_{i=1}^n (d_i + |p_i - \alpha_i| + \alpha_i^2)]$.

If Z is CP$(\alpha,F)$, then

(1.5)  $d(\sum_{i=1}^n X_i, Z) \le E[\sum_{i=1}^n (d_i + p_i^2) + |\sum_{i=1}^n p_i - \alpha|]$.


This result says, roughly, that $\sum_{i=1}^n X_i$ is approximately compound
Poisson when the $X_i$'s are rarely nonzero (the $p_i$'s are small) and, given
that the $X_i$'s are nonzero, their conditional distributions $F_1,\ldots,F_n$
are nearly identical. Note that (1.5) with $\alpha = \sum_{i=1}^n \alpha_i$ is different from
(1.4); in some cases the bound in (1.4) is smaller than that in (1.5) but
in other cases the reverse is true.
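
As a numerical illustration of (1.3), consider the special case of independent integer-valued $X_i$ with nonrandom $p_i$ and common conditional distribution F, so that every $d_i$ is zero and the bound reduces to $\sum p_i^2$. The sketch below computes the exact distance by convolution; the parameter values, the truncation level `size`, and the helper names are assumptions made for this example only.

```python
import math

def convolve(f, g, size):
    """Convolution of two pmfs on {0, ..., size-1}, truncated to that range."""
    out = [0.0] * size
    for i, fi in enumerate(f):
        if fi == 0.0:
            continue
        for j, gj in enumerate(g):
            if i + j < size:
                out[i + j] += fi * gj
    return out

def compound_poisson_pmf(alpha, f, size, terms=80):
    """Truncated series for the CP(alpha, F) pmf on {0, ..., size-1}."""
    f = (list(f) + [0.0] * max(0, size - len(f)))[:size]
    power = [1.0] + [0.0] * (size - 1)
    h = [0.0] * size
    weight = math.exp(-alpha)
    for n in range(terms):
        h = [hk + weight * pk for hk, pk in zip(h, power)]
        power = convolve(power, f, size)
        weight *= alpha / (n + 1)
    return h

def distance_to_cp(ps, f, size=80):
    """d(sum X_i, Z) for independent X_i with law p_i F + (1 - p_i) delta_0."""
    f = (list(f) + [0.0] * max(0, size - len(f)))[:size]
    s = [1.0] + [0.0] * (size - 1)
    for p in ps:
        g = [p * fk for fk in f]
        g[0] += 1.0 - p
        s = convolve(s, g, size)
    h = compound_poisson_pmf(sum(ps), f, size)
    tail = max(0.0, 1.0 - sum(h))        # CP mass beyond the truncation point
    return 0.5 * (sum(abs(a - b) for a, b in zip(s, h)) + tail)

ps = [0.05, 0.10, 0.02, 0.08]            # hypothetical p_i
f = [0.0, 0.5, 0.3, 0.2]                 # hypothetical F on {1, 2, 3}
distance = distance_to_cp(ps, f)
bound = sum(p * p for p in ps)           # right side of (1.3) with d_i = 0
print(distance, bound)
```

In this independent case the computed distance should not exceed the bound; the example does not exercise the dependent setting that Theorem 1 is designed for.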

For  the  degenerate  distribution  F  on  R  with  unit  mass  at  1,  Theorem 
1  yields  bounds  for  Poisson  or  mixed  Poisson  approximations  for  sums.  In 
this  case,  (1.4)  is  the  same  as  (1.2),  and  (1.5)  is  consistent  with  the 
inequalities of Brown (1983) and Kabanov et al. (1983),
which were established by martingale techniques.

Brown  (1983)  also  obtains  compound  Poisson  approximations  for 
certain  discrete  variables  via  Poisson  approximations.  This  approach, 
however,  does  not  apply  to  the  general  case.  We  prove  our  results  by 
rather  direct  arguments  based  on  judicious  conditioning  and  the  use  of 
(1.1)  as  a  random  distance  for  random  distributions.  Our  approach  also 
brings to light the key role of the $F_i$'s.

From its proof, one can easily see that Theorem 1 is also true when
the number of variables n in the sum is a stopping time of $\{\mathcal{F}_i\}$. For
instance, Theorem 1 applies to sums of the form $\sum_{i=1}^{N(t)} X_i$, where
$N(t) = \sum_i I(T_i \le t)$ and $T_1 < T_2 < \cdots$ are stopping times of the increasing
$\sigma$-fields $\{\mathcal{F}(t)\}$, and $\mathcal{F}_i = \mathcal{F}(T_i)$. Theorem 1 also holds when
F and $\alpha$ are random; the Z in (1.4) and (1.5) would then be mixed compound
Poisson.

The  rest  of  this  paper  is  organized  as  follows.  Section  2  gives 
some  basics  on  the  total-variation  distance,  Section  3  consists  of  the 
proof  of  Theorem  1,  and  Section  4  gives  an  example  for  Markovian 
occurrences  of  an  event. 


2.  Basic  Inequalities  for  Distances 


Let  X  and  Y  be  random  elements  of  some  measurable  space.  A 
well-known  coupling  inequality  is 

(2.1)  $d(X,Y) \le P(X \ne Y)$.

The X, Y in the probability are the random elements, with an arbitrary
dependency, defined on a common probability space. Inequality (2.1)
follows because $P(X \in B) \le P(X \ne Y) + P(Y \in B)$.

It  is  natural  for  us  to  analyze  d(X,Y)  in  terms  of  conditional 
probabilities. Accordingly, we sometimes refer to X as having a
distribution $EF(\cdot)$ where F is a random distribution. Typically,
$F(B) = P(X \in B \mid \mathcal{F})$, or F could be defined as a measurable function of
random elements.

Lemma 2.1. Suppose X and Y have the respective distributions $EF(\cdot)$ and
$EG(\cdot)$, where F and G are random distributions. Then

(2.2)  $d(X,Y) \le E[d(F,G)]$.

In case $F(B) = P(X \in B \mid \mathcal{F})$ and $G(B) = P(Y \in B \mid \mathcal{G})$, for some $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$,
then

(2.3)  $d(X,Y) \le E[d(F,G)] \le E[P(X \ne Y \mid \mathcal{F}, \mathcal{G})]$.

Proof. Expression (2.2) follows since

$d(X,Y) = \sup_B |EF(B) - EG(B)| \le \sup_B E|F(B) - G(B)| \le E[\sup_B |F(B) - G(B)|] = E[d(F,G)]$.

Expression (2.3) follows from (2.2) and a random version of (2.1).

Remark.  Keep  in  mind  that  F,G  in  the  expectation  in  (2.2)  are  the  random 
distributions  on  a  common  probability  space  and  their  dependency  is 
arbitrary.  A  similar  comment  applies  to  the  X,Y,F,G  in  the  probability 
in  (2.3). 

Distances involving functions of random elements, such as sums or
maxima, can generally be represented as $D = d(h(X),h(Y))$, where
$X = (X_1,\ldots,X_n)$, $Y = (Y_1,\ldots,Y_n)$ and h is a measurable function from the
range space of X and Y to some other measurable space. Here are some
bounds on this distance.

Lemma 2.2. (i) $D \le d(X,Y)$. (ii) $D \le \sum_{i=1}^n P(X_i \ne Y_i)$.

(iii) If $X_1,\ldots,X_n$ are independent and $Y_1,\ldots,Y_n$ are independent,
then $D \le \sum_{i=1}^n d(X_i,Y_i)$.

(iv) If $X_1,\ldots,X_n$ are adapted to the increasing $\sigma$-fields $\{\mathcal{F}_i\}$ and
$Y_1,\ldots,Y_n$ are adapted to the increasing $\sigma$-fields $\{\mathcal{G}_i\}$, and
$F_i(B) = P(X_i \in B \mid \mathcal{F}_{i-1})$, $G_i(B) = P(Y_i \in B \mid \mathcal{G}_{i-1})$, then

(2.4)  $D \le E[\sum_{i=1}^n d(F_i,G_i)] \le E[\sum_{i=1}^n P(X_i \ne Y_i \mid \mathcal{F}_{i-1}, \mathcal{G}_{i-1})]$.

Proof. Statement (i) is true since

$D = \sup_B |P(h(X) \in B) - P(h(Y) \in B)| = \sup_B |P(X \in h^{-1}(B)) - P(Y \in h^{-1}(B))| \le d(X,Y)$.

Statement (ii) is true since by (i) and (2.1) we have

$D \le P(X \ne Y) = P(\bigcup_{i=1}^n \{X_i \ne Y_i\}) \le \sum_{i=1}^n P(X_i \ne Y_i)$.

Now consider (iii) when n = 2. From (i), the triangle inequality for d,
and the independence, we have

$D \le d((X_1,X_2),(Y_1,Y_2)) \le d((X_1,X_2),(Y_1,X_2)) + d((Y_1,X_2),(Y_1,Y_2)) \le d(X_1,Y_1) + d(X_2,Y_2)$.

Using this inequality and induction yields (iii) for general n.

Under the hypotheses of (iv), it follows by successive conditioning
that $P(X \in B_1 \times \cdots \times B_n) = E[F_1(B_1) \cdots F_n(B_n)]$, and a similar statement holds
for Y. Then using (i), (2.2) and (iii) we have

$D \le d(X,Y) \le E[d(F_1 \cdots F_n, G_1 \cdots G_n)] \le E \sum_{i=1}^n d(F_i,G_i)$.

The second inequality in (2.4) follows from (2.3).

The  next  two  results  deal  with  compound  Poisson  distributions. 

Lemma 2.3. If X is CP$(\alpha,F)$ and Y is CP$(\beta,G)$, with $F(\{0\}) = 0$ and
$G(\{0\}) = 0$, then $d(X,Y) \le |\alpha - \beta| + (\alpha \wedge \beta) d(F,G)$.

Proof. First consider the case in which $\alpha \le \beta$. Clearly Y is equal in
distribution to $Y_1 + Y_2$, where $Y_1, Y_2$ are independent CP$(\beta - \alpha, G)$ and
CP$(\alpha,G)$, respectively. Note that the distributions of X and $Y_2$ can be
written as $EF^{N*}(\cdot)$ and $EG^{N*}(\cdot)$, respectively, where N is a Poisson
random variable with mean $\alpha$. Then applying the triangle inequality,
(2.2), (2.1) and Lemma 2.2 (iii) in the form $d(F^{n*},G^{n*}) \le n d(F,G)$, we
have

$d(X,Y) \le d(X,Y_2) + d(Y_2, Y_1 + Y_2) \le E d(F^{N*},G^{N*}) + P(Y_1 \ne 0)$

$\le EN d(F,G) + 1 - e^{-(\beta - \alpha)} \le \alpha d(F,G) + \beta - \alpha$.

This proves the assertion when $\alpha \le \beta$, and a similar proof applies when
$\alpha > \beta$.

Lemma 2.4. Suppose X is a random element of S and let

(2.5)  $p = P(X \ne 0)$ and $F(B) = P(X \in B \mid X \ne 0)$.

If Z is CP$(p,F)$, then $d(X,Z) \le p^2$.

Proof. It suffices, by (2.1), to construct X, Z on a common probability
space such that $P(X \ne Z) \le p^2$. To this end, let N, U and $Y_1, Y_2, \ldots$ be
independent random elements on a common probability space such that N is
a Poisson random variable with mean p, each $Y_i$ has the distribution F,
and $P(U = 0) = (1 - p)e^p = 1 - P(U = 1)$. Define

$X = Y_1 (1 - I(U = 0, N = 0))$ and $Z = \sum_{i=1}^N Y_i$.

An easy check shows that X satisfies (2.5), and Z is clearly CP$(p,F)$.
Furthermore, since X = Z on the event {N = 1},

$P(X \ne Z) = P(X \ne Z, N = 0) + P(X \ne Z, N \ge 2)$

$\le P(U = 1)P(N = 0) + P(N \ge 2) = p(1 - e^{-p}) \le p^2$.

This completes the proof.
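
The closing identity in the proof can be checked numerically. The sketch below evaluates $P(U = 1)P(N = 0) + P(N \ge 2)$ term by term and compares it with $p(1 - e^{-p})$ and $p^2$; the function name `mismatch_bound` and the test values of p are illustrative assumptions.

```python
import math

def mismatch_bound(p):
    """P(U = 1) P(N = 0) + P(N >= 2) for the coupling in the proof of Lemma 2.4."""
    prob_n0 = math.exp(-p)                       # P(N = 0), N Poisson with mean p
    prob_n1 = p * math.exp(-p)                   # P(N = 1)
    prob_u1 = 1.0 - (1.0 - p) * math.exp(p)      # P(U = 1)
    return prob_u1 * prob_n0 + (1.0 - prob_n0 - prob_n1)

for p in (0.01, 0.1, 0.5, 0.9):
    value = mismatch_bound(p)
    # Columns: p, the coupling bound, the closed form p(1 - e^{-p}), and p^2.
    print(p, value, p * (1.0 - math.exp(-p)), p * p)
```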

We  end  this  section  by  comparing  two  random  elements  that  have 
certain  conditional  distributions  that  are  equal. 

Lemma 2.5. Let X and Y be random elements. If there is a measurable set
A such that $P(X \in B \mid X \in A) = P(Y \in B \mid Y \in A)$ for each measurable B, then

(2.6)  $d(X,Y) \le |P(X \in A) - P(Y \in A)|$.

Proof. Let U, V and W be independent random elements on a probability
space. Assume that U is uniform on (0,1) and that V and W take values in
A and $A^c$, respectively, and their distributions are $P(V \in B) =
P(X \in B \mid X \in A)$ and $P(W \in B) = P(X \in B \mid X \in A^c)$. Let p and q denote the
respective probabilities in (2.6), and define $X = V I(U \le p) + W I(U > p)$
and $Y = V I(U \le q) + W I(U > q)$. Clearly X and Y satisfy the hypotheses
and, moreover, $P(X \ne Y) = P(p \wedge q < U \le p \vee q) = |p - q|$. Thus the
assertion follows by applying (2.1).
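
The construction in this proof produces two mixtures of the same pair of laws with different weights. A small discrete sketch, assuming hypothetical laws for V (on A = {1, 2}) and W (on the complement {0}), shows the distance matching $|p - q|$ in that case; the helper names are illustrative.

```python
def mixture(weight, v, w):
    """pmf of V I(U <= weight) + W I(U > weight) for pmfs v, w given as dicts."""
    out = {}
    for x, mass in v.items():
        out[x] = out.get(x, 0.0) + weight * mass
    for x, mass in w.items():
        out[x] = out.get(x, 0.0) + (1.0 - weight) * mass
    return out

def tv_distance(f, g):
    """Total-variation distance between two pmfs given as dicts."""
    support = set(f) | set(g)
    return 0.5 * sum(abs(f.get(x, 0.0) - g.get(x, 0.0)) for x in support)

v = {1: 0.6, 2: 0.4}     # hypothetical law of V, concentrated on A = {1, 2}
w = {0: 1.0}             # hypothetical law of W, concentrated on the complement
p, q = 0.30, 0.22
x_law, y_law = mixture(p, v, w), mixture(q, v, w)
print(tv_distance(x_law, y_law), abs(p - q))
```

Because V and W have disjoint supports here, the bound (2.6) is attained with equality in this example.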

3.  Proof  of  Theorem  1 

In addition to the notation of Theorem 1, we let $G_p(\cdot) =
pF(\cdot) + (1-p)\delta_0(\cdot)$, where $\delta_0$ is the Dirac measure with unit mass at 0, and
we let Y be a random element with distribution $E(G_{p_1} * \cdots * G_{p_n}(\cdot))$ (recall
that $p_i$ is random).


To prove (1.3), consider the inequality

(3.1)  $d(\sum_{i=1}^n X_i, Z) \le d(\sum_{i=1}^n X_i, Y) + d(Y,Z)$.

By the use of successive conditioning, it is clear that
$P(\sum_{i=1}^n X_i \in B) = E[F'_1 * \cdots * F'_n(B)]$, where $F'_i(B) = P(X_i \in B \mid \mathcal{F}_{i-1})$.
Note that $F'_i(\cdot) = p_i F_i(\cdot) + (1 - p_i)\delta_0(\cdot)$, and so $d(F'_i, G_{p_i}) = p_i d(F_i,F) \le d_i$.
Then applying (2.2) and Lemma 2.2 (iii), we have

(3.2)  $d(\sum_{i=1}^n X_i, Y) \le E[d(F'_1 * \cdots * F'_n, G_{p_1} * \cdots * G_{p_n})] \le E(\sum_{i=1}^n d_i)$.

Similarly, using $P(Z \in B) = E[H_{p_1} * \cdots * H_{p_n}(B)]$, where the distribution $H_p$ is
CP$(p,F)$, and applying Lemmas 2.1, 2.2 (iii) and 2.4, we have

(3.3)  $d(Y,Z) \le E[d(G_{p_1} * \cdots * G_{p_n}, H_{p_1} * \cdots * H_{p_n})]
\le E[\sum_{i=1}^n d(G_{p_i}, H_{p_i})] \le E(\sum_{i=1}^n p_i^2)$.

Then combining (3.1) - (3.3) yields the assertion (1.3).

Now consider the assertion (1.4). Here Z is CP$(\sum_{i=1}^n \alpha_i, F)$. Let
$U_1,\ldots,U_n$ be independent random elements with the respective
distributions $G_{\alpha_1},\ldots,G_{\alpha_n}$. Then by applications of (3.2), Lemmas 2.1,
2.2 (iii) and 2.5 (with $A = S \setminus \{0\}$), we have

$d(\sum_{i=1}^n X_i, Z) \le d(\sum_{i=1}^n X_i, Y) + d(Y, \sum_{i=1}^n U_i) + d(\sum_{i=1}^n U_i, Z)$

$\le E(\sum_{i=1}^n d_i) + E[d(G_{p_1} * \cdots * G_{p_n}, G_{\alpha_1} * \cdots * G_{\alpha_n})] + d(G_{\alpha_1} * \cdots * G_{\alpha_n}, H_{\alpha_1} * \cdots * H_{\alpha_n})$

$\le E[\sum_{i=1}^n (d_i + |p_i - \alpha_i| + \alpha_i^2)]$.



Finally, to prove (1.5), consider the inequality

(3.4)  $d(\sum_{i=1}^n X_i, Z) \le d(\sum_{i=1}^n X_i, Z') + d(Z',Z)$,

where Z is CP$(\alpha,F)$ and Z' is mixed CP$(\sum_{i=1}^n p_i, F)$. By Lemmas 2.1 and 2.3 we
have $d(Z',Z) \le E|\sum_{i=1}^n p_i - \alpha|$. Applying this and (1.3) to (3.4) yields
(1.5).

4.  A Compound Poisson Approximation for Markovian Occurrences of an Event

Suppose that $Y_0, Y_1, \ldots$ is a Markov chain with states 0 and 1 that
represent the non-occurrence and occurrence, respectively, of a certain
event E. Let $\epsilon = P(Y_1 = 1 \mid Y_0 = 0)$ and $p = P(Y_1 = 1 \mid Y_0 = 1)$, and assume
that $\epsilon$ and p are not zero or one. The stationary distribution of this
Markov chain is

$\pi(0) = (1 - p)/(1 - p + \epsilon)$,  $\pi(1) = \epsilon/(1 - p + \epsilon)$.

Consequently, when $\epsilon$ is small, the event E is rare.
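
The stationarity of $\pi$ can be confirmed by applying one transition of the chain. The sketch below does this for hypothetical values of $\epsilon$ and p; the function names `stationary` and `step` are assumptions for illustration.

```python
def stationary(eps, p):
    """Stationary distribution (pi(0), pi(1)) of the two-state chain."""
    denom = 1.0 - p + eps
    return (1.0 - p) / denom, eps / denom

def step(dist, eps, p):
    """One transition: P(0 -> 1) = eps and P(1 -> 1) = p."""
    pi0, pi1 = dist
    return (pi0 * (1.0 - eps) + pi1 * (1.0 - p), pi0 * eps + pi1 * p)

eps, p = 0.02, 0.3       # hypothetical parameter values
pi = stationary(eps, p)
print(pi, step(pi, eps, p))
```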

Consider the sum $N_n = \sum_{i=1}^n Y_i$, which is the number of occurrences of
the event E in time n. We assume, for simplicity, that the Markov chain
is stationary. Isham (1980) and Boker and Serfozo (1983) showed that if
$\epsilon$ varies with n such that $\epsilon \to 0$ and $n\epsilon \to \alpha > 0$ as $n \to \infty$, then $N_n$
converges in distribution to a random variable Z that is CP$(\alpha,F)$ with
$F(\{k\}) = p^{k-1}(1 - p)$, $k \ge 1$. A bound on the rate of this convergence is
given in the following result. Brown (1983) obtained a variation of this
bound by another approach.

Theorem 4.1.

(4.1)  $d(N_n,Z) \le |n\epsilon - \alpha| + \epsilon(1 + p + \epsilon n(2 - p))/(1 - p + \epsilon)$.

Proof. Define the random variables

$X_i = \sum_{k=1}^{\infty} k(1 - Y_{i-1}) Y_i \cdots Y_{i+k-1}(1 - Y_{i+k})$,  $i = 1,\ldots,n$,


X!  -  E  kY  Y 
k=l  L 


V1  *  W  • 


When the Markov chain begins a sojourn in state 1 at time i (a success
run of the event E), then $X_i$ records the length of that sojourn.
Clearly

$p_i = P(X_i \ne 0 \mid Y_0,\ldots,Y_{i-1}) = \sum_{k=1}^{\infty} (1 - Y_{i-1}) \epsilon p^{k-1}(1 - p) = \epsilon(1 - Y_{i-1})$,

$F_i(k) := P(X_i \le k \mid Y_0,\ldots,Y_{i-1}, X_i \ge 1) = F(k)$.

Let $T_n = \sum_{i=1}^n X_i$ and $T'_n = T_n + X'_1$, and consider

(4.2)  $d(N_n,Z) \le d(N_n,T'_n) + d(T'_n,T_n) + d(T_n,Z)$.


Clearly

(4.3)  $d(N_n,T'_n) \le P(N_n \ne T'_n) = P(Y_n = 1, Y_{n+1} = 1) = \pi(1)p$,

(4.4)  $d(T'_n,T_n) \le P(X'_1 \ne 0) = P(Y_1 = 1) = \pi(1)$,

and by (1.5)

(4.5)  $d(T_n,Z) \le E \sum_{i=1}^n p_i^2 + E|\sum_{i=1}^n p_i - \alpha|$

$= n\epsilon^2 \pi(0) + E|\epsilon \sum_{i=1}^n (1 - Y_{i-1}) - \alpha|$

$\le n\epsilon^2 \pi(0) + \epsilon n \pi(1) + |n\epsilon - \alpha|$.

Combining  (4.2)  -  (4.5)  yields  (4.1). 
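
The bound (4.1) can be compared with the exact distance for a particular parameter choice. The sketch below computes the exact law of $N_n$ by dynamic programming over the chain, computes a truncated CP$(\alpha,F)$ law with the geometric F, and evaluates both sides; the parameter values, the truncation levels `size` and `terms`, and all function names are assumptions made for this illustration.

```python
import math

def count_pmf(n, eps, p):
    """Exact pmf of N_n = Y_1 + ... + Y_n for the stationary two-state chain."""
    denom = 1.0 - p + eps
    to_one = (eps, p)                    # P(Y_i = 1 | Y_{i-1} = 0 or 1)
    # state[s][k] = P(Y_i = s, Y_1 + ... + Y_i = k), started from Y_0 ~ pi
    state = [[(1.0 - p) / denom] + [0.0] * n, [eps / denom] + [0.0] * n]
    for _ in range(n):
        new = [[0.0] * (n + 1), [0.0] * (n + 1)]
        for s in (0, 1):
            for k in range(n):
                new[0][k] += state[s][k] * (1.0 - to_one[s])
                new[1][k + 1] += state[s][k] * to_one[s]
        state = new
    return [state[0][k] + state[1][k] for k in range(n + 1)]

def cp_geometric_pmf(alpha, p, size, terms=80):
    """Truncated pmf of CP(alpha, F) with F({k}) = p^(k-1) (1 - p), k >= 1."""
    f = [0.0] + [p ** (k - 1) * (1.0 - p) for k in range(1, size)]
    power = [1.0] + [0.0] * (size - 1)   # F^{0*}
    h = [0.0] * size
    weight = math.exp(-alpha)
    for n in range(terms):
        h = [hk + weight * pk for hk, pk in zip(h, power)]
        nxt = [0.0] * size
        for i, a in enumerate(power):
            if a:
                for j in range(1, size - i):
                    nxt[i + j] += a * f[j]
        power = nxt
        weight *= alpha / (n + 1)
    return h

def bound_41(n, eps, p, alpha):
    """Right side of (4.1)."""
    return abs(n * eps - alpha) + eps * (1 + p + eps * n * (2 - p)) / (1 - p + eps)

n, eps, p = 50, 0.02, 0.3                # hypothetical parameter values
alpha, size = n * eps, 120
s = count_pmf(n, eps, p) + [0.0] * (size - n - 1)
h = cp_geometric_pmf(alpha, p, size)
tail = max(0.0, 1.0 - sum(h))            # CP mass beyond the truncation point
distance = 0.5 * (sum(abs(a - b) for a, b in zip(s, h)) + tail)
print(distance, bound_41(n, eps, p, alpha))
```

Varying `eps` with `n * eps` held fixed gives a rough picture of how conservative the bound is for a given p.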

Remark. Note that the preceding proof applies (1.5) to the auxiliary sum
$T_n$ instead of to the original sum $N_n$. One could also apply (1.4) to
$T_n$, but this would yield (4.1) with $2 - p$ replaced by $(2 - p)^2$, which is
worse.